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Abstract 

Aims. We introduce BASE (Bayesian astrometric and spectroscopic exoplanet detection and characterisation tool), a novel program 
for the combined or separate Bayesian analysis of astrometric and radial-velocity measurements of potential exoplanet hosts and 
binary stars. The capabilities of BASE are demonstrated using all publicly available data of the binary Mizar A. 
Methods. With the Bayesian approach to data analysis we can incorporate prior knowledge and draw extensive posterior inferences 
about model parameters and derived quantities. This was implemented in BASE by Markov chain Monte Carlo (MCMC) sampling, 
using a combination of the Metropolis-Hastings, hit-and-run, and parallel-tempering algorithms to explore the whole parameter space. 
Nonconvergence to the posterior was tested by means of the Gelman-Rubin statistic (potential scale reduction). The samples were 
used directly and transformed into marginal densities by means of kernel density estimation, a "smooth" alternative to histograms. 
We derived the relevant observable models from Newton's law of gravitation, showing that the motion of Earth and the target can be 
neglected. 

Results. With our methods we can provide more detailed information about the parameters than a frequentist analysis does. Still, a 
comparison with the Mizar A literature shows that both approaches are compatible within the uncertainties. 

Conclusions. We show that the Bayesian approach to inference has been implemented successfully in BASE, a flexible tool for 
analysing astrometric and radial-velocity data. 

Key words, astrometry - celestial mechanics - methods: data analysis - methods: statistical - techniques: interferometric - techniques: 
radial velocities 



1. Introduction 

The search for extrasolar planets - places where one day hu- 
mankind might find other forms of life in the Universe - has been 
a subject of scientific investigation since the nineteenth century, 
but only became successful in 1992 with the first confirmed 
discovery of an exoplanet orbiting the pulsar PSR B1257+12 
(Wolszczan & Frail 1992). Still, because it is situated in an en- 
vironment hostile to life as we know it, this case has been of 
less relevance to the public than the first detection of a Sun-like 
planet-host star, 51 Pegasi (Mayor & Queloz 1995). Since that 
time, more than 700 extrasolar planet candidates have been un- 
veiled in more than 500 systems, more than 90 of which show 
signs of multiplicity (Schneider 2012). 

Closely related to the detection of extrasolar planets is the 
characterisation of their orbits. Both these tasks now profit from 
the existence of a variety of observational techniques, which we 
briefly sketch in the following. Comprehensive reviews can be 
found in Perry man (2000) and Deeg et al. (2007, chapter 1). 

Direct observational methods refer to the imaging of exo- 
planets (e. g. Levine et al. 2009), which reflect the light of their 
host stars but also emit their own thermal radiation. To overcome 
the major obstacle of the high brightness contrast between planet 
and star, techniques such as coronagraphy (Lyot 1932; Levine 
et al. 2009), angular differential imaging (Marois et al. 2006; 
Vigan et al. 2010), spectral differential imaging (Smith 1987; 



* BASE, the computer program introduced in this article, can be 
downloaded at http : //www. mpia. de/homes/schulze/base .html. 



Vigan et al. 2010), and polarimetric differential imaging (Kuhn 
et al. 2001 ; Adamson et al. 2005) have been invented. Still, imag- 
ing has only revealed few detections and orbit determinations so 
far. 

The most productive methods in terms of the number of de- 
tected and characterised exoplanets are of an indirect nature, ob- 
serving the effects of the planet on other objects or their radia- 
tion. 

Of these, transit photometry (e. g. Charbonneau et al. 2000; 
Seager 2008) is noteworthy because it has helped uncover more 
than 200 exoplanet candidates, plus over 2,000 still unconfirmed 
candidates from the Kepler space mission (Koch et al. 2010): 
small decreases in the apparent visual brightness of a star during 
the primary or secondary eclipse point to the existence of a tran- 
siting companion. These data allow one to determine the planet's 
radius and orbital inclination and may also yield information on 
the planet's own radiation. 

Timing methods include measurements of transit timing 
variations (TTV) and transit duration variations (TDV) (e. g. 
Holman & Murray 2005; Nascimbeni et al. 2011) of either bi- 
naries or stars known to harbour a transiting planet. The method 
used in the first exoplanet detection (Wolszczan & Frail 1992) is 
pulsar timing, which relies on slight anomalies in the exact tim- 
ing of the radio emission of a pulsar and is sensitive to planets in 
the Earth-mass regime. 

Microlensing (Mao & Paczynski 1991; Gould 2009), which 
accounted for about 15 exoplanet candidates, uses the relativis- 
tic curvature of spacetime due to the masses of both a lens star 
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and its potential companion, with the latter causing a change in 
the apparent magnification and thus the observed brightness of a 
background source. 

Perhaps the most well-known technique, and one of those on 
which this article is based, is known as Doppler spectroscopy 
or radial- velocity (RV) measurements (e. g. Mayor & Queloz 
1995; Lovis & Fischer 2010). With more than 500 exoplanet 
candidates, it has been most successful in detecting new exo- 
planets and determining their orbits to date. From a set of high- 
resolution spectra of the target star, a time series of the line- 
of-sight velocity component of the star is deduced. These data 
allow one to determine the orbit in terms of its geometry and 
kinematics in the orbital plane as well as the minimum planet 
mass m p m i n « m p sin i. To derive the actual planet mass m p , 
the inclination i of the orbit plane with respect to the sky plane 
needs to be derived with a different method, e. g. astrometry. The 
RV technique is distance-independent by principle, but signal- 
to-noise requirements do pose constraints on the maximum dis- 
tance to a star. Stellar variability sometimes makes this approach 
difficult because it alters the line shapes and thus mimicks RV 
variations. The signal in stellar RVs caused by a planet in a cir- 
cular orbit has a semi-amplitude of approximately 



K^nipSini-, , (1) 

where m p ,m*,/, G, and a m \ are the masses of planet and host 
star, the orbital inclination, Newton's gravitational constant, and 
the semi-major axis of the planet's orbit relative to the star, re- 
spectively. This approximation holds for m p «: m+, which is 
true in most cases. It should be noted that the sensitivity of the 
RV method decreases towards less inclined (more face-on) or- 
bits, which is an example for the selection effects inherent to any 
planet-detection method. 

Finally, astrometry (AM; e. g. Gatewood et al. 1980; Sozzetti 
2005; Reffert 2009) - on which this work is also based - is the 
oldest observational technique known in astronomy: a stellar po- 
sition is measured with reference to a given point and direction 
on sky. Astrometry can thus be considered as complementary to 
Doppler spectroscopy, which measures the kinematics perpen- 
dicular to the sky plane. In contrast to the latter, AM allows one 
to determine the orientation of the orbital plane relative to the 
sky in terms of its inclination i and the position angle Q. of the 
line of nodes with respect to the meridian of the target. A planet 
in circular orbit around its host star displaces the latter on sky 
with an approximate angular semi-amplitude of 



where d is the distance between the star and the observer. Again, 
this approximation holds for m p « m^. 

Imaging astrometry, in its attempt to reach sufficient presi- 
cion, still faces problems due to various distortion effects. By 
contrast, interferometric astrometry has been used to determine 
the orbits of previously known exoplanets, mainly with the help 
of space-borne telescopes such as Hipparcos or the Hubble 
Space Telescope (HST), which presently still excel their Earth- 
bound competitors (e. g. McArthur et al. 2010). However, instru- 
ments like PRIMA (Delplancke et al. 2000; Delplancke 2008; 
Launhardt et al. 2008) or GRAVITY (Gillessen et al. 2010) at 
the ESO Very Large Telescope Interferometer are promising to 
advance ground-based AM even more in the near future. 

While planet-induced signals in AM and RVs are both ap- 
proximately linear in planetary mass m p , they differ in their de- 
pendence on the orbital semi-major axis a K \ (Eqs. 1 and 2). 



Doppler spectroscopy is more sensitive to smaller orbits (or 
higher orbital frequencies, Eq. 45), while AM favours larger or- 
bital separations, viz. longer periods. 

In this article we introduce BASE, a Bayesian astrometric 
and spectroscopic exoplanet detection and characterisation tool. 
Its goals are to fulfil two major tasks of exoplanet science, 
namely the detection of exoplanets and the characterisation of 
their orbits. BASE has been developed to provide for the first 
time the possibility of an integrated Bayesian analysis of stellar 
astrometric and Doppler-spectroscopic measurements with re- 
spect to their companions' signals, 1 correctly treating the mea- 
surement uncertainties and allowing one to explore the whole 
parameter space without the need for informative prior con- 
straints. Still, users may readily incorporate prior knowledge, 
e. g. from previous analyses with other tools, by means of pri- 
ors on the model parameters. The tool automatically diagnoses 
convergence of its Markov chain Monte Carlo (MCMC) sampler 
to the posterior and regularly outputs status information. For or- 
bit characterisation, BASE delivers important results such as the 
probability densities and correlations of model parameters and 
derived quantities. 

Because published high-precision AM observations of po- 
tential exoplanet host stars are still sparse, we used data of the 
well-known binary Mizar A to demonstrate the capabilities of 
BASE. It is also planned to gain astrophysical insights into exo- 
planet systems using BASE in the near future. 

This article is organised as follows. Section 2 provides an 
overview of the most often-used methods of data analysis, in- 
cluding B ayes' theorem and MCMC as theoretical and im- 
plementational foundations of this work, as well as a deriva- 
tion of the necessary observable models. BASE is described in 
Section 3. In Section 4, the target Mizar A and the data used in 
this article are discussed. Section 5 presents and discusses our 
analysis of Mizar A. Conclusions are drawn in Section 6. 

2. Methods and models 

Data analysis is a type of inductive reasoning in that it infers gen- 
eral rules from specific observational data (e. g. Gregory 2005b). 
These general rules are described by observable models, simply 
called models in the following, which produce theoretical val- 
ues of the observables as a function of parameters. The primary 
tasks of data analysis are listed in the following. 

1. In model selection, the relative probabilities of a set of 
concurrent models {At,}, chosen a priori, are assessed. 
Specifically, exoplanet detection tries to decide the ques- 
tion of whether a certain star is accompanied by a planet or 
not, based on available data. Additionally, model-assessment 
techniques can be used to determine whether the most prob- 
able model describes the data accurately enough. 

2. Parameter estimation aims to determine the parameters of 
a chosen model. This is specifically referred to as exoplanet 
characterisation (or orbit determination) in the present con- 
text. 

3. The purpose of uncertainty estimation is to provide a mea- 
sure of the parameters' uncertainties. 

Although model selection is equally important, in what fol- 
lows we focus entirely on the second and third tasks, viz. pa- 

1 Although most methods described in this introduction apply to any 
kind of companion to a star, we refer here to companions as "exoplan- 
ets", irrespective of whether they are able to sustain hydrogen or deu- 
terium burning. 
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rameter and uncertainty estimation. This is because for a known 
binary system, only one model is appropriate, viz. two bodies or- 
biting each other. Accordingly, BASE can only perform model 
selection when it analyses data from stars for which it is a-priori 
unknown whether a companion exists. 



2. 1 . Likelihoods and frequentist inference 

The well-established, conventional frequentist approach to in- 
ference is touched upon only briefly here. Its name stems from 
the fact that it defines probability as the relative frequency of 
an event. Measurements are regarded as values of random vari- 
ables drawn from an underlying population that is characterised 
by population parameters, e. g. mean and standard deviation in 
the case of a normal (Gaussian) population. In the following, we 
derive the joint probability density of the values of AM and RV 
data, known as the likelihood X, which plays a central role in 
frequentist inference. 

When combining data of different types, one should gener- 
ally be aware that potential systematic errors may differ between 
the data sets, e. g. due to a calibration error in one instrument 
that renders the data inconsistent with each other. In this case, 
each data set analysed separately would imply a different result. 
In other instances, systematic errors in one set do not affect the 
other data: for example, any constant offset in radial velocity is 
absorbed into parameter V, which is irrelevant to the analysis of 
astrometric data. In the following derivation, we assumed that no 
systematic effects are present that led to inconsistent data. 

In the following, we assumed that the error e, of any datum y, 
is statistically independent of those of all other data and consists 
of two components, each distributed according to a (uni- or bi- 
variate) normal distribution with zero mean: 

- a component €qj corresponding to a nominal measurement 
error, whose distribution is characterised by the covariance 
matrix Eo,, or standard deviation cr 0> , given with the datum, 
for AM and RV data respectively, and 

- a component e + , representing e. g. instrumental, atmospheric 
or stellar effects not modelled otherwise, whose distribu- 
tion is characterised by scalar covariance matrix E + = 
diag(T 2 ,r;) - assuming no correlation between the noise 
in the two AM components - or variance cr 2 , respectively, 
where t + and cr + are free noise-model parameters. 

The AM data covariance matrix Eo,, of datum i, representing the 
uncertainty of and correlation between the two components mea- 
sured, can be written using singular-value decomposition as 





b 2 



(3) 



where R(-), bi and fa are the 2 x 2 passive rotation matrix, the 
nominal semi-major and -minor axes of the uncertainty ellipse 
and the position angle of its major axis, respectively. 

Using characteristic functions, it is readily shown that the 
sum e, = eb,,- + e+j of the two independent error components is 
again normally distributed, with zero mean and covariance ma- 
trix E; = Eo, ( - + E + or standard deviation cr ( - = (cr 2 . + cr 2 ) , 
respectively. 



The probability density 2 of the values of A^am two- 
dimensional AM data {r,} and Af RV RV data {v,}, known as the 
likelihood X, is then given by 



X - Xam Xrv, 



(4) 



where Xam and Xrv are the likelihoods pertaining to the indi- 
vidual data types, 



Xam - 



Xrv 
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(6) 



Furthermore, the sums of squares x\ M and;^ 2 v are defined by 

Nam 



X 2 am = ^(ri-mt^Ej'iri-mtd), 
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(8) 



where r„ v, are the z'th AM and RV datum, r(-; •), v(-; •) the AM 
and RV model functions and is the vector of model parameters. 
The relevant models are derived in Section 2.3. 



Parameter estimation. Frequentist parameter estimation is 
generally equivalent to maximising X or minimising x 2 as func- 
tions of 0. The resulting best estimates of the parameters are 
therefore often called maximum-likelihood or least-squares esti- 
mates. For linear models, x 2 (0) 1S a quadratic function and con- 
sequently can be found unambiguously by matrix inversion. In 
the more realistic cases of nonlinear models, however, x 2 may 
have many local minima, therefore care needs to be taken not 
to mistake a local minimum for the global one. Several meth- 
ods exist to this end, including evaluation of x 2 (0) on a finite 
grid, simulated annealing or genetic algorithms (e. g. Gregory 
2005b). 

Uncertainty estimation. Frequentist parameter uncertainties 
are usually quoted as confidence intervals. Procedures to derive 
these are designed such that when repeated many times based 
on different data, a certain fraction of the resulting intervals will 
contain the true parameters. Popular methods use bootstrapping 
(Efron & Tibshirani 1993) or the Fischer information matrix, 
which is based on a local linearisation of the model (e. g. Ford 
2004). However, these methods suffer from specific caveats: the 
Fischer matrix is only appropriate for a quadratic-shaped x 2 in 
the vicinity of the minimum, and bootstrapping, which relies on 
modified data, may lead to severe misestimation of the parameter 
uncertainties, especially when these are large (Vogt et al. 2005). 



2.2. Bayesian inference 

Bayesian inference (e. g. Sivia 2006), which has gained popular- 
ity in various scientific disciplines during the past few decades, 



2 Throughout, we use the term probability density wherever it refers 
to a continuous quantity, as opposed to probability for discrete quanti- 
ties. Probability distribution, denoted by p(-), is a generic term used for 
both cases. 
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defines probability as the degree of belief in a certain hypothe- 
sis fi. While this is sometimes criticised as leading to subjec- 
tive assignments of probabilities, Bayesian probabilities are not 
subjective if they are based on all relevant knowledge "7C, hence 
different people having the same knowledge will assign them the 
same value (e. g. Sivia 2006). Thus, Bayesian probabilities are 
conditional on the knowledge "7C, and this conditionality should 
be stated explicitly, as in the following equations. 

2.2.1. Bayes' theorem 

In the eighteenth century, Thomas Bayes laid the foundation of 
a new approach to inference with what is now known as Bayes' 
theorem (Bayes & Price 1763). For the purpose of parameter and 
uncertainty estimation, the hypothesis *7f refers to the values of 
model parameters 0, and Bayes' theorem can be expressed as 



ors p(0\ M, *K). This can be seen by inserting p(0|Al, *K) = const 
into Eq. 9, which leads to 



p(6>|D,M,<70 



p(£>|fl,M,7Q-p(6>|M,'70 
p(D|M,70 



(9) 



where p(-) is a probability (density). Furthermore, D = {(?,,}>,)} 
is the set of pairs of observational times 3 and corresponding data 
values, and M denotes the particular model assumed. As men- 
tioned above, all probabilities are also conditional on the knowl- 
edge "7C, including statements on the types of parameters and the 
parameter space (which we assume to be a subset of R k with 
k € N) as well as the noise model. 

Using Bayes' theorem, the aim is to determine the posterior 
T>(0) = p(0\D, M, 70, i- e. the probability distribution of the pa- 
rameters 6 in light of the data D, given the model M and prior 
knowledge %. The other terms, located on the right-hand side of 
the theorem, are explained below. 

- The term prior refers to the probability distribution 
p(0|Al,'7C) of the parameters 6 given only the model and 
prior knowledge *7C; it characterises the knowledge about the 
parameters present before considering the data. For objec- 
tive choices of priors, based on classes of parameters, see 
Section A. 

- The likelihood p(D\0, M, "7C) is the probability distribution of 
the data values D, given the times of observation, the model 
and the parameters. It is introduced in the context of frequen- 
tist inference in Section 2.1. 

- The evidence is the probability distribution of the data val- 
ues D, given the times of observation and the model but ne- 
glecting the parameter values, 



p(D\M,<K) = j p(D, 0^,^)60 

= j p(6»|M,'7C)p(D|6',M, , 7C)d6l. 



(10) 
(11) 



It equals the integral of the product of prior and likelihood 
over the parameter space and plays the role of a normalis- 
ing constant, which is hard to calculate in practice, however. 

It may be instructive here to note that the frequentist ap- 
proach of maximising the likelihood p(D\0, A1,7C) is equiva- 
lent to maximising the posterior when assuming uniform pri- 



3 In general, f, is the value of an independent variable, which may 
e. g. be temporal or spatial. We assumed the measurement durations to 
be short in comparison with the characteristic time of orbital motion, 
given by the orbital period, and thus the observations to take place at 
points in time f, which are known exactly. 



n0) = p(6>|D,M,70 oc piD^M,^). 



(12) 



However, this maximum-likelihood approach ignores the fact 
that uniform priors are not always the most objective choice (see 
Section A) and the posterior cannot be fully characterised just 
by the position of its maximum. Still, the latter can be used as a 
posterior summary in the Bayesian framework (Section 2.2.2). 

2.2.2. Posterior inference 

Sampling from the posterior. To estimate the normalised pos- 
terior P(0), i. e. the probability distribution of the parame- 
ters in light of the data, N samples of the parameter vector 
|#0) ; j = 1, . . . , ArJ are collected using the Markov chain Monte 
Carlo method (MCMC; e. g. Gilks et al. 1996) in the variant de- 
scribed by the Metropolis-Hastings algorithm (MH; Metropolis 
et al. 1953; Hastings 1970), which performs a random walk 
through parameter space. The distribution of these samples - ex- 
cluding the first M < N burn-in samples, which are still strongly 
correlated with the starting state (O) - converges to the poste- 
rior !P(-) in the limit of many samples if the chain obeys certain 
regularity conditions (e. g. Roberts 1996). Methods for setting 
the starting state (O) and detecting convergence are described in 
Section 3.4. 

Starting from the current chain link e 0, the following 
steps lead to the next link, according to the MH algorithm and 
the hit-and-run sampler (step 1; Boneh & Golan 1979; Smith 
1980): 

1 . Set up a candidate C: 

(a) sample a direction, viz. a random unit vector d eR k from 
an isotropic density over the A:-dimensional unit sphere; 

(b) sample a (signed) distance r from a uniform distribution 
over the interval \r' e R : (J) + r'd e 0); 

(c) set candidate C = (J> + rd; 

2. calculate the acceptance probability 



a(6» 0) ,C) = min|l 



no 

<P{0ti)) 



(13) 



3. draw a random number /? from a uniform distribution over 
the interval [0, 1]; 

4. if fi < a, accept the candidate, i. e. set the next link 0fJ +r> = 
C, otherwise, 6» ( - /+1) = (j \ 

The hit-and-run sampler, compared to alternatives like the Gibbs 
sampler (Geman & Geman 1984), favours exploring of the 
whole parameter space without becoming "trapped" in the 
vicinity of a local posterior maximum (Gilks et al. 1996). 

Because only the ratio of two posterior values is used 
(step 2), the normalising evidence p(D\M, "7C) - a constant that 
is difficult to determine, as mentioned above - is irrelevant in the 
MH algorithm. 

Marginalisation and density estimation. Obviously, the poste- 
rior mode alone reveals only one particular aspect of the poste- 
rior. However, as a density over k > 2 dimensions, the posterior 
cannot be displayed unambiguously in a figure. 

To obtain a plottable summary of the posterior, a set of 
marginal posteriors ?*,•(•), i. e. probability densities over each of 
the parameters and joint marginal posteriors Pij(-, •) over two 
parameters, can be estimated. Theoretically, these densities are 
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derived from the posterior density by marginalisation, viz. inte- 
gration over all other parameters, 

Vm = m\D,M,K) = J P(O)d0 v (14) 

r,,W,Ji,) = tffi h 6 j \D,M,'K) = J nO)d0 VJ , (15) 

where d0 v = Uk*i d0k and dO\ Uj = Uktij d0 k . Practically, 
marginal posteriors are estimated by only considering compo- 
nent i of the collected samples and performing a density 
estimation based on them. Joint marginal posteriors are derived 
analogously, based on components i and j. 

Several density estimators exist for deriving a density from 
a set of samples. One of them - the oldest and probably most 
popular type, known as the histogram - has several drawbacks: 
its shape depends on the choice of origin and bin width, and 
when used with two-dimensional data, a contour diagram cannot 
easily be derived from it. Generalising the histogram to kernel 
density estimation over one or two dimensions, the samples can 
be represented more accurately and unequivocally (Silverman 
1986). 

Below, we refer only to the simpler one-dimensional case. 
There, the kernel estimator can be written as 



T{x) = 



1 



N(T keI 



0"ker 



(16) 



where x is a scalar variable, K(-) the kernel, cr ker the win- 
dow width and are the underlying samples. As detailed by 
Silverman (1986), the efficiency of various kernels in terms of 
the achievable mean integrated square error is very similar, and 
therefore the choice of kernel can be based on other require- 
ments. Since no differentiability is required for the estimated 
densities and computational effort plays an important practical 
role, a triangular kernel, 



K tri (x) = max(l — |jc|, 0) , 



(17) 



was selected for estimating the marginal posteriors. The window 
width is chosen following the recommendations of Silverman 
(1986), 



(r ker = 2.189 • min (cr samp) ^) (N - M)K 



(18) 



where o- samp , r iq ,N and M are the sample standard deviation, the 
interquartile range of the samples, number of samples and burn- 
in length, respectively. 

Periodogram mode. By default, the window width for marginal 
posteriors is based on the MCMC sample standard devia- 
tion cr samp (Eq. 18). If there are multiple maxima, however, 
this can lead to artificially broad peaks, which can be particu- 
larly problematic for the orbital frequency / (see Section 2.3), 
which plays an important role in distinguishing different solu- 
tions in orbit-related parameter estimation. Therefore, BASE in- 
cludes a periodogram mode, in which the window width of the 
marginal posterior of / (a Bayesian analogon to the frequentist 
periodogram) is reduced according the following procedure. 

1. Initially assume a default window width as given by Eq. 18; 

2. estimate the marginal posterior of / and find its local max- 
imum /max nearest to posterior mode /, as well as the local 
minimum f m [ n nearest to / max ; 



3. calculate the marginal -posterior standard deviation o J over 
the half-peak between / max and / m ; n ; 

4. re-calculate the window width using Eq. 18, with y/lcr' re- 
placing o- samp ; 

5. repeat step 2, but only consider local minima with ordi- 
nates p(f min \D, M,<K) < 0.5 • p(f m - dx \D, M,K) in order not 
to be misled by weak marginal-posterior fluctuations; 

6. repeat steps 3 and 4. 

Parameter estimation. To obtain a single most probable esti- 
mate of the parameters, the posterior density P(-) can be sum- 
marised by the posterior mode e ©, i. e. the point where the 
posterior assumes its maximum value, 



= arg max e P(0). 



(19) 



This point, also known as the maximum a-posteriori (MAP) pa- 
rameter estimate, can be approximated by the MCMC sample 
with highest posterior density, based on the values of P(flW) al- 
ready calculated during sampling. This approximation neglects 
the finite spacing between samples. 

Alternatively, the following scalar summaries can be inferred 
from the samples or, for the marginal mode, from the marginal 
posteriors r Pi{0): 



mean or expectation 6, 

Xco 
9Pi(0)d0, 
CO 

median 0, 



f 

•J — a 



Pi(0)d0 = 0.5, 



marginal mode 0, 
0= argmax e !P,(0). 



(20) 



(21) 



(22) 



Uncertainty estimation. For uncertainty estimation, highest 
posterior-density intervals (HPDIs) can be derived from the pos- 
terior samples. For any given C e R with < C < 1, a 
HPDI /hpd = [a, b] is defined as the smallest interval over which 
the posterior contains a probability C, 



r 

J a 



Pi(0)d0 = C, s.t. b-a = rmn. 



(23) 



BASE automatically calculates HPDIs of probability contents 
50%, 68.27%, 95%, 95.45%, 99%, and 99.73%; others may be 
added on user request. 

In contrast to frequentist confidence intervals, HPDIs are 
generally not symmetric, meaning that their midpoint does not 
correspond to the best estimate. This is because the marginal 
posteriors may be asymmetric, including any amount of skew. 

It should also be noted that HPDIs are not useful with multi- 
modal posteriors because several modes cannot be meaningfully 
summarised by one interval per dimension, nor by a single best 
estimate. 

To quantify linear dependencies between parameters, the a- 
posteriori Pearson correlation coefficient, 



r 8^,02 = 



COV(0i,0 2 ) 

Vvar(0!)var(0 2 ) 



r e 2 A> 



(24) 
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can be inferred from the samples. There may also be nonlinear 
correlations between parameters that are not described by the 
correlation coefficients. One should also be aware that for strong 
linear or non-linear relationships between parameters, uncertain- 
ties of single parameters as characterised by HPDIs may not be 
meaningful. 

We stress that the (joint) marginal posteriors can - and 
should - always be referred to, especially when best estimates 
and/or HPDIs do not adequately characterise the posterior. The 
availability of these more informative densities is one of the ad- 
vantages of a Bayesian approach with posterior sampling. 

2.3. Observable models 

Independent of the chosen approach to inference - frequentist 
or Bayesian -, theoretical values of the observables need to be 
calculated and compared to the data by means of the likelihood. 
To this end, an observable model is set up for each relevant type 
of data, i. e. a function f(0; f) of the model parameters 6 and 
time t. An overview of all model parameters used in this work is 
given in Table 1, while Table 2 lists quantities that can be derived 
from them. 

In this section, we only sketch the derivation of the observ- 
able models, beginning with a single-planet system. For an in- 
depth treatment of celestial mechanics, the interested reader is 
referred e. g. to Moulton (1984). 



BASE performs a one-time pre-calculation of E over an 
(e, M)-grid, which, because of the monotonicity of E as an (im- 
plicit) function of e and M, allows one to reduce the effort of nu- 
merically solving Eq. 26 by providing lower and upper bounds 
on E. 

By reference to Eqs. 25 and 26, it is readily shown that the 
stellar coordinates are periodic functions ofx with period 1. We 
therefore call x a cyclic parameter and treat it as lying in the 
range [0, 1). 

2.3.2. Transformation into the reference system 

To derive the stellar barycentric position as seen from the per- 
spective of an observer, we transform S i into a new coordinate 
system S 4 by three successive rotations. These are described by 
three Euler angles, termed in our case argument of the periap- 
sis to*, inclination ; and position angle of the ascending node £2, 
and are carried out as follows (Fig. 1): 

1. Rotate S 1 about its zi-axis by (-to*) such that the ascending 
node 4 of the stellar orbit lies on the positive X2-axis. 

2. Rotate S 2 about its X2-axis by (+;') such that the new Z3-axis 
passes through the observer. 

3. Rotate S 3 about its z 3 -axis by such that the new X/paxis 
is parallel to the meridian of the CM and points in a northern 
direction. 



2.3.1. Stellar motion in the orbital plane 

Newton's Law of Gravity governs the motion of a non- 
relativistic two-body system of star and planet, whose centre of 
mass (CM) rests in some inertial reference frame. A solution 
to it is given by both the star and the planet moving in ellipti- 
cal Keplerian orbits with a fixed common orbital plane and each 
with one focus coinciding with the CM. 

To describe the stellar position, whose variation is observ- 
able with astrometry and Doppler spectroscopy, we set up a co- 
ordinate system S 1 whose origin is identical to the CM, z-axis 
perpendicular to the orbital plane and the vector from the CM to 
the periapsis orientated in positive x-direction. In Si, the stellar 
barycentric position is given by 



r\ = a* 



cos E — e 
Vl - e 2 sin£ 




= ri(E;a*,e), 



(25) 



where a* is the semi-major axis, e the eccentricity and E the 
eccentric anomaly. 

The time-dependent eccentric anomaly is determined implic- 
itly by Kepler's equation, 



E - e sin£ = 2n(x + f(t - h)) = M(t), 



(26) 



where / = P 1 is the orbital frequency, P the orbital period, 
fi the time of first measurement and M(-) the mean anomaly, 
which varies uniformly over the course of an orbit. Furthermore, 
following Gregory (2005a), we use 



M(h) 

* s ^T =/(fl - r) ' 



(27) 



with T standing for the last time the periapsis was passed prior 
to t\ (time of periapsis). Kepler's equation is transcendental and 
needs to be solved numerically to obtain E for every relevant 
combination of e and M. 



Thus, the stellar barycentric position has new coordinates 

r 4 = RzA-zfl, 



with matrix 



R zxz = 



(A F J\ 
B G K 

,C H L, 



defining the rotations; its components are 

A = cos O cos to* - sin Q cos ; sin to* 

B = sin O cos to* + cos Q cos /sin to* 

F = - cos Q. sin to* - sin Q cos i cos to* 

G = - sin O sin to* + cos Q cos i cos to* 

J = - sin £2 sin i 

K = cos O sin ; 

C = - sin ;' sin to* 

H = - sin ;' cos to* 

L = cos i. 



(28) 



(29) 



(30) 
(31) 
(32) 
(33) 
(34) 
(35) 
(36) 
(37) 
(38) 



A, B, F, and G are known as the Thiele-Innes constants, first in- 
troduced by Thiele (1883). 

By taking the time derivative of Eq. 28, we obtain the stellar 
velocity in S4, 



v 4 



R -p 

zxz dE 



1 - e cos E 



- sinE 
Vl - e 2 cos£ 




(39) 



4 The ascending node is the point of intersection of the orbit and the 
sky plane where the moving object passes away from the observer. In 
step 2, a positive rotation angle is used to ensure that the node is indeed 
ascending (Fig. 1 b). 
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Figure 1. Definition of the angles oj*, i, fi. a) From S i to 5 2 , the star and 
its sense of rotation about the CM are indicated; the dotted line marks 
the major axis of the orbital ellipse, b) From S 2 to S 3 , the observer and 
line of sight are indicated, c) From S3 to 54, the positive x^-axis points 
northward along the meridian of the CM. 



2.3.3. Relation to the planetary orbit 

By reference to the above results, the observables of AM and RV 
are easily derived (Section 2.3.4). Based on the following simple 
relation, they can be parameterised by quantities pertaining to 
the planetary instead of the stellar orbit. 



According to the definition of the CM, the line connecting 
star and planet contains the CM and the ratio of their respective 
distances from the CM equals the inverse mass ratio, 



^CP, 



(40) 



where C, S and P stand for CM, star and planet, respectively. 
This implies a simple relation between the orbits of the star and 
the planet as follows. The two bodies orbit the CM with a com- 
mon orbital frequency / and time of periapsis T. With regard 
to the corresponding periapsis, they always have the same ec- 
centric anomaly E. Their orbital shapes, viz. eccentricities e, are 
identical as well. 

Furthermore, the two bodies share the same sense of orbital 
revolution, hence those nodes of both orbits which lie on the pos- 
itive X2-axis are ascending. Consequently, the only Euler angle 
that differs between stellar and planetary orbit is the argument 
of periapsis, which differs by n because star and planet are in 
opposite directions from the CM. 

2.3.4. Observables 

To express the stellar barycentric position as a two-dimensional 
angular position, we performed a final transformation of S 4 into 
a spherical coordinate system S 5 with radial, elevation and az- 
imuthal coordinates (r, 6, a). Its origin is identical with the ob- 
server, its reference plane coincides with the (j4,Z4)-plane and 
its fixed direction is -z 4 . In S 5, the radial coordinate of the CM 
equals a distance d = lAUra" 1 , with m being the parallax. 
Stellar coordinates r 4 therefore correspond to a two-dimensional 
angular position 



r 5 = 



1 x 4 

d [y* 



(41) 



in S 5, where 6 and a are called the declination and right ascen- 
sion, respectively. Using Eqs. 25, 28 and 29, the model function 
for the angular position of the star with respect to the CM be- 
comes 



r(0; t) = r 5 = a! I (cos E - e) 



vr 



'• sin E 



(42) 



with a' = G7a*(lAU) _1 . In practice, complications arise because 
the stellar position is often measured relative to a physically 
unattached reference star, whose distance and motion differ, and 
not relative to the unobservable CM. By contrast, for a visual 
binary, the companion can be used as a reference, yielding the 
simple model described below. 

Other astrometric effects may be caused by the acceler- 
ated motion of Earth-bound observers around the solar system 
barycentre (SSB). This is discussed for the case of the binary 
Mizar A in Section 2.3.5. 

In contrast to astrometry, RV data are usually automatically 
transformed into an inertial frame resting with respect to the SSB 
(e. g. Lindegren & Dravins 2003), which allows the Earth's mo- 
tion to be neglected in this model and treats the observer's rest 
frame as being inertial. The model function for the stellar radial 
velocity measured by an observer is thus given by (vs) r , with 

, ^ sin£ sin u> - Vl - e 2 cos £ cos co ,,„ N 

(v 5 ), -V= -(v 4 ) z = K , (43) 

1 - e cos E 

where V is the RV offset, consisting of the radial velocity of the 
CM plus an offset due to the specific calibration of the instru- 
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ment - which therefore differ, in general, between RV data sets 
- and K is the RV semi-amplitude, which can be expressed as 



3/ - - 

K = 2nfa+ sin; = ^2nGf m p (m p + m*) 3 sin;'. 
The last equality holds because of Kepler's third law, 



(flp + fl*) 3 / 2 = ^2 (m P + m *) 



(44) 



(45) 



and the definition of the CM (Eq. 40). Owing to Eq. 44, only 
one of a±,K needs to be employed in the AM and RV models; 
for BASE, K was adopted. 

As an aside, in the literature (e. g. Gregory 2005a) often a 
different version of Eq. 43 is employed that involves v instead of 
E and the alternative definition K. d n = —f= (see Table 2). 



Binary system. If the primary and secondary binary compo- 
nents assume the roles of star and planet, respectively, the above 
reasoning also yields the observables of a binary system. 

For visual binaries, AM measurements often refer to the po- 
sition of the secondary with respect to the primary. From the 
definition of the CM, it follows that the orbit of the secondary 
with respect to the primary is identical with its barycentric or- 
bit but scaled by a factor (mi + m^mT^ , or with the semi-major 
axis equaling the sum of the two components' barycentric semi- 
major axes, 



Orel - «1 + a.2- 



(46) 



Thus, the AM model of Eq. 42 can be used for a binary with a K \ 
replacing a*, a>2 + n replacing and 



VI - 



1AU ' 



(47) 



Equation 43 yields the RV of component ;' if K is replaced by 
(-l) i+l Ki and a> by co 2 . If AM and RV data are combined, BASE 
uses Ki t 2 instead of #1,2 and calculates a re i using the equivalent 
of Eq. 44 in combination with Eq. 46. 



where the first term corresponds to the annual parallax and ji is 
the magnitude of the CM's proper motion. For the second term, 

we have jiAt 



-Mis- 



using sin(x + £ ) « sinx + £cosx, for £ <K 1, along with 
Eqs. 30 - 35 and 48 as well as the final results for Q., ;', to and m 
(using the values 8 from Table B.l) and the proper motion from 
Table 3 yields the following maximum changes of the Thiele- 
Innes constants: 



|AA| g 8.30- lO" 6 
|AB| g 8.00 • 10~ 6 
|AF| £ 7.99 • 10~ 6 
|AG| % 4.34 • 10~ 6 . 

Hence, with Ar as y/\A~dft + |A(acos(5)| 2 , 

|Atf| < < el (2|AA| + |AF|) 
\A(aco&6)\ < a' tA (2|AB| + |AG|) 



(49) 
(50) 
(51) 
(52) 



(53) 
(54) 



and the final posterior median a~ K \ (Table B.2), we obtain the 
maximum change in relative angular position of the two compo- 
nents, 



Ar g 0.311//as. 



(55) 



Comparison with Table 4 reveals that this is more than two or- 
ders of magnitude smaller than the median AM measurement 
uncertainty, proving that the motions of the Earth and the CM of 
Mizar A can indeed be neglected. 



2.3.5. Effects of the motion of observer and CM 

If we consider the observer, whose position relative to the CM 
defines the orientation of S2...5, to be located on Earth and there- 
fore to be subject to Earth's accelerated motion around the SSB, 
the angles Q, ;', oj as defined above are not constant but rather 
functions of time; an additional source of their variation is the 
(assumedly linear) proper motion of the CM. Consequently, 
these angles are not appropriate constant model parameters even 
within a timespan of a few months. Furthermore, the systems 
S2...5 defined above are not strictly inertial, which renders the 
simple coordinate transformations invalid. However, we argue 
below that the variation in these angles is so weak that these ef- 
fects are indeed negligible in the context of this work. 

Because AM determines instantaneous positions, it is di- 
rectly influenced by changes in the positions of the observer and 
the CM. In the following, we assess for Mizar A the greatest pos- 
sible magnitude of a change in the AM position Ar = \Ar\ due 
to angular changes AQ, A;', Aw > 0, which are in turn caused 
by the varying relative position of observer and CM. Finally, we 
compare Ar to the AM measurement uncertainties. 

An upper limit on each of the angular changes AO, A;', Aoj 
can be derived from the proper motion of the CM and the annual 
parallax from the Earth's motion, as 

\AD.\,\Ai\,\Aco\<2m + fiAt, (48) 
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Table 1. Model parameters used by BASE. 









Widest 








Data Types 6 


Symbol 


Designation 


Unit 


Prior Support 


Cyclic" 


Prior Type 


AM 


RV 


AM+RV 


e 


eccentricity 


1 
1 


rn 1 ~i 
LU, i) 




uniform 


X 


X 


X 


f 

X 

OJ, U>2 
I 


orbital frequency 

mean anomaly at t\ over 2zr 

argument of periapsis c 

lllCllllallUIl 


d- 1 

i 
i 

rad 
rad 


(0, 1] 
rn 1 ~i 

\s>, i) 

[0, In) 
[V, n) 


X 
X 


Jeffreys 
uniform 
uniform 

1 1 n i f /"\i"m 
LII111UI111 


X X X X 


X 
X 
X 


X X X X 


n 


position angle of the ascending node e 


rad 


[0, In) 


X 


uniform 


X 




X 




semi-major axis of orbit of secondary 
around primary over distance 


mas 


[io 3 , io- 5 ]/ 




Jeffreys 


X 






■UJ 

V 


parallax 
RV offset 


arcsec 
ms~' 


(0, 0.77] * 




Jeffreys 
uniform 




X 


X 
X 


K, K t 

T+ 


RV semi-amplitude d 

standard deviation of additional AM noise 


ms~' 
mas 


> 

> 




mod. Jeffreys 
mod. Jeffreys 


X 


X 


X 
X 


0-+ 


standard deviation of additional RV noise 


ms -1 


>0 




mod. Jeffreys 




X 


X 



Notes. (<l) For cyclic parameters 6, the indicated lower and upper bounds are treated as equivalent by BASE (see Section 2.3). {b) The types of 
data for which each parameter is relevant. Abbreviations: AM (astrometry), RV (radial velocities). (c) Normal mode employs w, while binary mode 
employs oj 2 . {d) Normal mode employs K, while binary mode employs Ki and K 2 . {e) The widest prior range of Q, reduces to [0,n) if no RV data 
are provided, in which case it cannot be determined whether a given node is ascending or descending; then, Q. is defined to be the position angle 
of the first node. (/) Lower bound corresponds to AM measurement uncertainty of 1/jas; upper bound according to wide-binary observations by 
Tolbert (1964). <s) Interval includes trigonometric parallax of the nearest star, Proxima Centauri (Perryman et al. 1997). 

Table 2. Quantities derived from model parameters. 



Mode" Data Types " 



Definition 


Designation 


Unit 


Equations 


N 


B 


AM RV 


AM+RV 




period 


d 




X 


X 


X X 


X 




time of periapsis 


d 


27 


X 


X 


X X 


X 


m 


distance 


pc 






X 




X 


Q = 'Hi - !£>. 


binary mass ratio 


1 


40,44 




X 


X 


X 


K - K 
A alt = Y^^f 


alternative RV semi-amplitude c 


ms~' 




X 


X 


X 


X 


m J~ G /(2^sin0 3 


component mass 


Mq 


44, 45 




X 




X 


K i 

a J - Infsini 


semi-major axis of component's 
orbit around CM 


AU 


44 




X 




X 


a «i-ai +a 2 - 2nfsini 


semi-major axis of secondary's 
orbit around primary 


AU 


44 




X 




X 


a * sin! = lk 


semi-major axis of stellar orbit 
times sine of inclination 


AU 


44 


X 




X 




m pMn = K{2nGfyhm v , mm + m*)i 


minimum planetary mass d 


M, 


44 


X 




X 




«p,max = a * sin ' 

r "*p,min 


maximum semi-major axis of 
planetary orbit 


AU 


40 


X 




X 





Notes. (a) The modes in which each derived quantity appears. Abbreviations: N (normal), B (binary). (i) The types of data for which each derived 
quantity is relevant. Abbreviations: AM (astrometry), RV (radial velocities). <c) Refers to the star or any of the binary components, respectively. 
{d) The implicit function of minimum planetary mass m Pimin is solved numerically. The planet's minimum mass equals its real mass if the orbit is 
edge-on, viz. sin i = 1 . 



3. BASE - Bayesian astrometric and spectroscopic 
exoplanet detection and characterisation tool 

We have developed BASE, a computer program for the com- 
bined Bayesian analysis of AM and RV data according to 
Section 2, for the following main reasons. 

- A statistically well-founded, reliable tool was needed that 
was able to perform a complete Bayesian parameter and un- 
certainty estimation, along with model selection (only for 
planetary systems, not detailed in this article). 

- We aimed to combine astrometry and Doppler-spectroscopy 
analyses. 



- A possibility to include knowledge from earlier analyses was 
needed. 

- Finding all relevant solutions across a multidimensional, 
high-volume parameter space was required. A more de- 
tailed knowledge of the parameters than a best estimate and a 
confidence interval can provide is especially important when 
the data do not constrain the parameters well, e. g. when only 
few data have been recorded or the signal-to-noise ratio is 
low (as can be the case for lightweight planets or young host 
stars). 

BASE is a highly configurable command-line tool developed 
in Fortran 2008 and compiled with GFortran (Free Software 
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Foundation 201 la). Options can be used to control the program's 
behaviour and supply information such as the stellar mass or 
prior knowledge (see Section 3.1). Any option can be supplied 
in a configuration file and/or on the command line. 

3.1. Prior knowledge 

Any model parameter can be assigned a prior probability density 
in one of three forms: 

- a fixed value; 

- a user-defined shape, i. e. a set {(#,-, p(0,j>M, TO))}; 

- a range (either in combination with a specific shape, which 
is then clipped to the given range, or else using one of the 
standard prior shapes detailed in Section A). 

Any parameter for which no such option is present is assigned 
a default prior shape and range to pose the weakest possi- 
ble restraints justified by the data, the model, and mathemati- 
cal/physical considerations. 

3.2. Physical systems and modes of operation 

As mentioned above, BASE is capable of analysing data for two 
similar types of systems, whose physics have been described in 
Section 2.3. These are systems with 

1 . one observable component that may be accompanied by one 
or more unobserved bodies (generally referred to as plane- 
tary systems here for simplicity); in this normal mode, the 
number of companions can be set to a value ranging from 
zero to nine, or a list of such values, in which case several 
runs are conducted and the outcome is compared in terms of 
model selection; 

2. two observable, gravitationally bound components (referred 
to as binary stars); this mode of operation is called binary 
mode. 

3.3. Types of data 

Observational data of the following types can be treated by 
BASE: 

- AM data, whose observable, in the case of binary targets, is 
the relative angular position of the two binary components. 
Each data record consists of a date, the angular position 
(a cos 6, 6) or (p, 8) in a cartesian or polar coordinate system, 
and its standard uncertainty ellipse, given by (a, b, (p); 

- RV data, where each data record consists of a date and the 
observed stellar radial velocity as well as its uncertainty. 

3.4. Computational techniques 

In the following, we describe some computational techniques 
implemented in BASE that are relevant to the present work. 

Improved exploration of parameter space. To enhance the 
mixing, i.e. rapidness of exploration of the parameter space, of 
Markov chains produced by the Metropolis-Hastings (MH) algo- 
rithm (Section 2.2.2) and decrease their attraction by local poste- 
rior modes, the parallel tempering (PT) algorithm (e. g. Gregory 
2005b) is employed one level above MH. Its function is to create 
in parallel n chains of length N, \ ff~ ( J } : j = I, . . . , N; k = I, . . . , n\ 



each sampled by an independent MH procedure, where the cold 
chain k = 1 uses an unmodified likelihood £(■), while the oth- 
ers, the heated chains, use as replacement X(-) r<t) with the pos- 
itive tempering parameter y^) < E After n swap samples of each 
chain, two chains k > 1 and k + 1 < n are randomly selected and 
their last links, denoted here by %) and 6(k+i), are swapped with 
probability 

W^+D = nnn(l,(^^) (^-^ ), (56) 

which ensures that the distributions of both chains remain un- 
changed. 

This procedure allows states from the "hotter" chains, which 
explore parameter space more freely, to "seep through" to the 
cold chain without compromising its distribution. Conclusions 
are only drawn from the samples of the cold chain. 

In contrast to MCMC sampling, which is sequential in na- 
ture, the structure of PT allows one to exploit the multipro- 
cessing facilities provided by many modern computing architec- 
tures. For this purpose, BASE uses the OpenMP API (OpenMP 
Architecture Review Board 2008) as implemented for GFortran 
by the GOMP project (Free Software Foundation 201 lb). 

Assessing convergence. As a matter of principle, it cannot be 
proven that a given Markov chain has converged to the posterior. 
However, convergence may be meaningfully defined as the de- 
gree to which the chain does not depend on its initial state # (0) 
any more. This can be determined on the basis of a set of inde- 
pendent chains - in our case, the cold chains of m independent 
PT procedures - started at different points in parameter space. 
These starting states should be defined such that for each of their 
components, they are overdispersed with respect to the corre- 
sponding marginal posteriors. 

Since the marginal posteriors are difficult to obtain before the 
actual sampling, BASE determines the starting states by repeat- 
edly drawing, for each parameter, a set of m samples from the 
prior using rejection sampling; the repetition is stopped as soon 
as the sample variance exceeds the corresponding prior variance, 
which yields a set of starting states overdispersed with respect to 
the prior. Assuming that the prior variance exceeds the marginal- 
posterior variance, the overdispersion requirement is met. 

Such a test, using the potential scale reduction (PSR) or 
Gelman-Rubin statistic, was proposed by Gelman & Rubin 
(1992) and was later refined and corrected by Brooks & Gelman 
(1998). It is repeatedly carried out during sampling and, in the 
case of a positive result, sampling is stopped before the user- 
defined maximum runtime and/or number of samples have been 
reached. It may also sometimes be useful to abort sampling man- 
ually, which can be done at any time. 

The statistic is calculated with respect to each pa- 
rameter 6 separately, using the post-burn-in samples 5 
: j = M+ l,...,N;l = l,...,m} of 9 provided by the 
m independent chains. It compares the actual variances of 
9 within the chains built up thus far to an estimate of the 
marginal-posterior (i. e. target) variance of 8, thus estimating 
how closely the chains have approached convergence. 

The PSR £ 1/2 is defined as 



5 Because the burn-in length M cannot be determined in advance, 
BASE considers a fixed but configurable fraction of samples to belong 
to the burn-in phase at any time. 
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where V is an estimate of the marginal-posterior variance, d the 
estimated number of degrees of freedom underlying the calcula- 
tion of V, and W the mean within-sequence variance. The quan- 
tities are defined as 



v _ 1 ,„ m + 1 „ 
V = W+ B 



mv 

m N 



m(v - 1) j-i V ' 



M+l 

2 



m _ 1 Z_J V / 



(58) 



(59) 



(60) 



where B is the between-sequence variance and v = N - M; a 
horizontal bar denotes the mean taken over the set of samples 
obtained by varying the omitted indexes. 

Owing to the initial overdispersion of starting states, V over- 
estimates the marginal-posterior variance in the beginning and 
subsequently decreases. Furthermore, while the chains are still 
exploring new areas of parameter space, W underestimates the 
marginal-posterior variance and increases. Thus, as convergence 
to the posterior is accomplished, K 1/2 \ 1 . Therefore, we can as- 
sume convergence as soon as the PSR has fallen below a thresh- 
old ft 1 ' 2 > 1. If BASE is run with fl, = 1, sampling is continued 
up to the user-defined length or duration, respectively. 

For cyclic parameters (Section 2.3), the default lower and 
upper prior bounds are equivalent, which needs to be taken into 
account when calculating the PSR. Thus, BASE uses the modi- 
fied definition of the PSR by Ford (2006) for these parameters. 

4. Target and data 

Mizar A (f 1 Ursae Majoris, HD 116656, HR 5054), the first 
spectroscopic binary discovered, is of double-lined type (SB II; 
Pickering 1890). Its basic physical properties are summarised 
in Table 3. Together with the spectroscopic binary Mizar B, it 
forms the Mizar quadruple system, seen from Earth as a visual 
binary with the components separated by about 1474. Mizar is 
the first double star discovered by a telescope and also the first 
one to be imaged photographically (Bond 1857). 

At an apparent angular separation of about 1L8, or 74 + 
39kAU spatial distance, Mizar is accompanied by Alcor, which 
has recently turned out to be a spectroscopic binary itself 
(Mamajek et al. 2010). Mizar and Alcor, also known as the 
"Horse and Rider", form an easy naked-eye double star, while it 



Table 3. Basic physical properties of Mizar A. 



Property 


Value 


Reference 


Type 


SB II 


1 


Spectral types 


2x A2V 


2 


M v 


2.27 ± 0.07 mag 


2 


M hA 


0.91 ±0.07 mag 


3 


L 


33.3 ±2.1 Lq 


3 


R 


2.4 ±0.1 Rq 


3 


■m" 


39.4 ±0.3 mas 


3 




119.01 ±1.49 masyr-' 


4 


Us 


-25.97 ±1.65 masyr 1 


4 



Notes. <a) Different values of the parallax have been estimated in this 
work (Table B.l). 

References. (1) Pickering (1890); (2) Hoffleit & laschek (1982); 
(3) Hummel et al. (1998); (4) van Leeuwen (2007) 



is still a matter of debate whether or not they constitute a physi- 
cally bound sextuplet. 

Published data used in this article are displayed in Figs. 7 - 
8 along with the model functions determined by BASE and their 
properties are summarised in Table 4. Using a Coude spectro- 
graph at the 1.93-m telescope at Observatoire Haute Provence, 
Prevot (1961) obtained 17 optical photographic spectra of the 
light of Mizar A combined with that of electric arcs or sparks 
between iron electrodes. For each of the 13 individual stellar 
lines identified in the spectra, one intermediate radial-velocity 
value per binary component was obtained by comparison with a 
set of reference lines of iron. Finally, a set of 17 pairs of RVs for 
the two components was calculated as arithmetic means of the 
corresponding intermediate values. The RV measurement uncer- 
tainty is not given by Prevot (1961) and was therefore estimated 
in the course of the present work, as described in Section 5. 

High-precision AM data of Mizar A were first obtained by 
Hummel et al. (1995) using the Mark III optical interferometer 
on Mount Wilson, California (Shao et al. 1988), with baseline 
lengths between 3 and 31 m. It measured the squared visibili- 
ties and their uncertainties at positions sampled over the aperture 
plane due to Earth's rotation. The visibilities can be modelled as 
a function of the diameters, magnitude differences, and relative 
angular positions (p, 9) of the binary components, Armstrong 
et al. (1992). These authors also describe a procedure to derive 
one angular position for each night of observation from a corre- 
sponding set of visibilities, which was adopted by Hummel et al. 
(1995) to obtain initial estimates of the orbital parameters. These 
positional data are also relevant for the present work. Hummel 
et al. (1995) also performed a direct fit to the squared visibilities 
to derive final estimates of the component diameters and orbital 
parameters. 

Later, a descendant instrument, the Navy Prototype 
Optical Interferometer (NPOI) at Lowell observatory, Arizona 
(Armstrong et al. 1998), was used by Hummel et al. (1998) to 
obtain more accurate results using three siderostats, viz. three 
baselines at a time. This allowed a better calibration using the 
closure phase, 6 which is independent of atmospheric turbulence. 
Similarly as before, Hummel et al. (1998) separately fitted bi- 
nary orbits directly to the visibility data and also to the positional 
angles derived for each night, concluding that the respective re- 
sults agree well with each other and that those parameters in 
common with spectroscopic analyses are compatible with Prevot 
(1961); they also performed a fit to both AM and RV data to ob- 
tain their final parameter estimates. 

While Hummel et al. (1998) did not include the older, less 
accurate Mark III data in their analysis, we present a combined 
treatment of all published data, i. e. the AM positions of Hummel 
et al. (1995) and Hummel et al. (1998) along with the RV data 
of Prevot (1961). 



6 The closure phase <p A is the phase of the product of three visibilities, 
each pertaining to a different baseline. 
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Table 4. Published data for Mizar A used in this work. 



Data type 


Instrument 


Observatory 


Year 


No. records 


Median uncertainty 


Reference 


Radial velocities 


Coude spectrograph 


Haute Provence 


1961 


2x 17 


2.050 kms" 1 " 


1 


Angular positions 


Mark III interferometer 


Mount Wilson 


1995 


28 


0.040 mas/ 0.345 mas* 


2 


Angular positions 


NPOI interferometer 


Lowell 


1998 


25 


0.042 mas/ 0.137 mas* 


3 



Notes. <a) Uncertainties are missing in the original publication, but have been estimated in this work (Section 5.1). (i) Uncertainties refer to the 
semi-minor and -major axes of the standard uncertainty ellipses, respectively (Section 2.1). 

References. (1) Prevot (1961); (2) Hummel et al. (1995); (3) Hummel et al. (1998) 



5. Analysis and results 

To illustrate the features of BASE and demonstrate its validity, 
we used the tool to analyse all published data of Mizar A. This 
section details the steps taken in and the results of our analy- 
sis. Its general goal was, given uninformative prior knowledge, 
to search the parameter space as comprehensively as possible to 
find and characterise the a posteriori most probable solution to- 
gether with its uncertainty and other characteristics. For reasons 
detailed below, once the RV data had been prepared, several runs 
of BASE (Table 5) were manually conducted, each using pri- 
ors derived from the previous pass, with relatively uninformative 
priors in the first step. 

In this analysis, our approach was to regard all nominal mea- 
surement uncertainties as accurate by setting parameters charac- 
terising additional noise in AM (t + ) as well as RV (cr + ) data to 
zero. 

Astrometric data alone allow one to constrain neither the 
RV offset V nor the amplitudes K\ , K-i but only the sum K\ + 
K-i (Section 2.3.4). By contrast, these parameters can be con- 
strained using spectroscopic data alone, or AM and RV data 
both. However, the AM data reduce the relative weight of the 
spectroscopic data and thus make the determination of these pa- 
rameters harder. We found in the course of this work that by an 
iterative approach, starting with a first pass using only RV data, 
this difficulty could be resolved. 

In all passes, BASE was configured to build up eight paral- 
lel chains, including the cold chain, using the parallel-tempering 
technique (detailed in Section 3.4). 10 8 posterior samples were 
collected, the first 10% of which were assumed to be burn-in 
samples. In the final pass, two PT procedures were employed to 
enable a test of convergence to the posterior, which reduced the 
number of samples collected with unchanged memory require- 
ments to 5 • 10 7 . Convergence was not assessed in earlier passes, 
because convergence was difficult to reach as long as several dis- 
tinct solutions existed within the prior support. 

5. 1 . Preparation of RV data 

Assuming appropriate measurement uncertainties is a prereq- 
uisite for the proper relative weighting of the different data 
types when combining them. Because these uncertainties are not 



Table 5. Analysis passes carried out sequentially. 



Pass 


Description 


Data 


A 


First constraints on RV parameters 


RV 


B 


Combining all data 


All 


C 


Selecting frequency / 


All 


D 


Selecting w 2 and Q, 


All 


E 


Refining results 


All 



Table 6. Pass A: initial prior ranges. 



e 


/" 


X 


U>2 


V b 




K 2 C 








(rad) 


(kms" 1 ) 


(kms" 1 ) 


(kms^ 1 ) 





2.7379 • 10" 6 








-65.25 








1 


1 


1 


In 


56.15 


69.50 


68.85 



Notes. (n) The prior bounds of / correspond to a period range between 
Id and lyr. (A) The bounds of V are given by the section of the RV 
ranges measured for primary and secondary, with the latter extended 
by one corresponding measurement uncertainty on both sides. <c) The 
upper bounds of Ki and K 2 are given by half the measured RV span of 
the corresponding component, plus one measurement uncertainty. 



quoted by Prevot (1961) for the spectroscopic data, we estimated 
them according to the following method. 

First, we assumed that each RV datum v, is the sum of the 
model value v(0 tme ; f,) and an error e„ where true are the true 
parameter values and the errors are independent and identically 
Gaussian-distributed with an unknown standard deviation cr. 
Furthermore, we assumed that the model parameters found in 
a particular analysis are identical with 6 tme . It follows that the 
uncertainty cr can be estimated as the sample standard deviation 
of the set of residuals {v,- - v(6 tme ; ?,-)}• 

Thus, based on the sample standard deviation of the best- 
fit residuals of Prevot (1961), viz. 2.13kms _1 , we initially as- 
sumed a conservative value of 2.50kms~ 1 for all data. To quan- 
tify the measurement uncertainties based on our own inference, 
we then conducted a preliminary analysis similar to pass A de- 
scribed in the next subsection, from which we finally inferred 
cr = 2.05kms~ 1 . In addition, we took a more correct alternative 
approach to estimating the RV uncertainties by assuming a rela- 
tively low value of cr = 2.00kms~ 1 and allowing for higher noise 
via <r + , which led to an estimate y/cr 2 + <j\ deviating by less 
than 1.7% from our previous estimate. 



5.2. Pass A: first constraints on RV parameters 

To facilitate the determination of the RV parameters K\ , K2 and 
V, a first pass using only RV data was carried out, as mentioned 
above. It used uninformative priors (Section A and Table 1), with 
bounds listed in Table 6. 

Figure 2 shows the resulting marginal posteriors (see 
Section 2.2.2) of the RV offset V and the amplitudes K\,K2, with 
the abscissae approximately corresponding to the 99% HPDIs. 
These marginal posteriors represent much tighter constraints on 
the parameters than the corresponding priors do. 
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e, 




-14 -12 -10 



Figure 2. Pass A: Marginal posteriors of RV amplitudes K t (top, solid 
line), K 2 (top, dashed line) and offset V (bottom), all plotted over the 
approximate range of the corresponding 99% HPDI. 



^3 




0.01 0.02 



0.03 0.04 



0.05 0.06 



Figure 3. Pass B : Marginal posterior of orbital frequency /, plotted over 
the range of the corresponding 99% HPDI. This is a Bayesian analogon 
to the frequentist periodogram. 

Table 7. Pass B: prior ranges for additional AM parameters. 



i 


a 


m" 


(rad) 


(rad) 


(arcsec) 











n 


In 


0.77 



Notes. (<I) Interval includes trigonometric parallax of nearest star, 
Proxima Centauri (Perryman et al. 1997). 



G 



3 




a>2, Q. [rad] 

Figure 4. Pass C: Marginal posteriors of the argument of periapsis u)% 
(solid line) and position angle of the ascending node Q. (dashed line), 
plotted over the approximate range of the corresponding 99% HPDIs. 
Triangles indicate the positions of small local maxima located approxi- 
mately ±n from the corresponding marginal modes. 



5.3. Pass B: combining all data 

Providing as new priors the marginal posteriors of all RV param- 
eters e, f,x, 0)2, y, Ki and K 2 from pass A, constrained to the cor- 
responding 99% HPDIs /hpd, 99%, and again using uninformative 
priors on the additional parameters (Table 7), the AM data were 
added and BASE was run again. Using the resulting posterior 
samples, BASE was additionally invoked in periodogram mode 
to refine the kernel window width for the marginal posterior of 
/ as described in Section 2.2.2. (This refinement was not used in 
pass A in order not to constrain the frequency too tightly before 
adding the AM data, because the combined data are expected to 
correspond to a marginally differing frequency.) 

The resulting marginal posterior (Fig. 3) exhibits a very 
strong mode around 0.04869d~', whose height is 8.1 times 
that of the next lower peak. Over the range of its mode, the 
marginal posterior contains a total probability of 45.0%. Most 
other parameters in this stage have very broad and/or multimodal 
marginal posteriors, hinting at different solutions still probable 
within the prior support. 



5.4. Pass C: selecting the frequency f 

For the next pass, the prior of / was provided by the marginal 
posterior of pass B (Fig. 3), constrained to the range of the 
marginal mode. For all other parameters, priors were identical 
with the previous marginal posteriors, constrained to 7 H pd, 99%- 

With the frequency thus restrained, the pass produced uni- 
modal marginal posteriors in all parameters except for u> 2 and 
£1 

The marginal posteriors of the argument of periapsis oj 2 and 
the position angle of the ascending node £2 exhibited small local 
maxima located at a distance of -l.02n and +1.037T from their 
marginal modes, respectively (indicated by triangles in Fig. 4). 
This can be explained by the fact that the Thiele-Innes constants 
(Eqs. 30 - 33), which appear in the AM model, contain prod- 
ucts of sin(-) and/or cos(-) functions with lo 2 and Q. as argu- 
ments. These products retain their values when lo 2 and Q. are 
both shifted by n - with opposite signs, since the marginal modes 
of these angles lie in different halves of the interval [0,2^). 
Consequently, the AM model function is invariant with respect 
to such shifts. (Thus, when analysing only AM data, it cannot 
be determined which node is ascending, hence Q is denned only 
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over the interval [0, n) and refers to the first node.) Owing to 
the combination with RV data, which independently constrain 
C02 (Eq. 43), the ambiguity is strongly reduced, as illustrated by 
the very small subpeaks in Fig. 4 - though it is not completely 
resolved. 

As another result from this pass, the high correlation coef- 
ficient of the two angles, r W2> n = -0.62, expresses the strong 
negative linear relationship between them introduced by the pos- 
sibility of a contrarious change. 

5.5. Passes D and E: selecting u> 2 and Q and refining results 

In pass D, the ambiguity in cj 2 and O was resolved by select- 
ing the range around their marginal modes via the new priors. 
For all other parameters, priors were again identical to the pre- 
vious marginal posteriors, constrained to /hpd, 99%- All resulting 
marginal posteriors turned out to be unimodal, corresponding to 
a single solution as opposed to several clearly distinct orbital so- 
lutions. 

As a final step, pass E was conducted to refine the results by 
using all previous marginal posteriors, constrained to / H pd, 99%, 
as new priors, thus confining the parameter space a priori to the 
most probable solution. 

(Joint) marginal posteriors and correlations. The final 
marginal posteriors of all model parameters, plotted over the 
corresponding 95% HPDIs, are shown in Fig. 5, along with the 
medians, 68.27% HPDIs (corresponding in probability content 
to the frequentist 1-cr confidence intervals) and MAP estimates. 
For comparison, literature estimates and confidence intervals are 
also included, where permitted by the abscissa range. 

Linear and nonlinear dependencies between a pair of param- 
eters may be qualitatively judged by means of the joint marginal 
posteriors (Section 2.2.2) shown in Fig. 6. The inclinations of 
their equiprobability contours with respect to the coordinate axes 
are related to the corresponding correlation coefficients. Some 
joint marginal posteriors exhibit clear deviations from bivariate 
normal distributions, illustrating that a Gaussian approximation 
of the likelihood by use of the Fischer matrix (Section 2. 1) would 
be inappropriate. 

Table B.3 lists the final posterior correlation coefficients. 
When these attain high absolute values, model-related equations 
can sometimes serve as an explanation. 

For example, the highly negative correlation between / and 
X is related to Eq. 26 in the following way. Given data and esti- 
mated model parameters, let t m be the time midway between the 
first and last observations, and E m the eccentric anomaly at t m . 
Now increase / by a small amount. This can be balanced by a 
small decrease of \ ( or > equivalently, by increasing the time of 
periapsis T) such that the eccentric anomaly at f m again equals 
E m . This contrarious change of / and x, as opposed to leaving 
only one of them altered, will make E deviate less, on average, 
over the total timespan; i. e. the model will fit the data better. 
The relation between / and^- so introduced is indicated by their 
negative correlation coefficient. 

As another example, the highly negative correlation coeffi- 
cient cl>2 and Q can be understood by reference to Eqs. 30 - 33, 
which describes the Thiele-Innes constants of the AM model as 
follows. In the case of edge-on orbits (inclination i = 0), these 
expressions simplify to sums or differences of cos(-) and sin(-) 
functions, the arguments being o) 2 + O or u> 2 - £2, with both ar- 
guments appearing equally often. An additional simplification is 
observed for face-on orbits (i = |), where the Thiele-Innes con- 





n 2n 

M 



Figure 8. All data types: from top to bottom: AM data and model (S co- 
ordinate); AM residuals AS; AM data and model (a cos 6 coordinate); 
AM residuals A(a cos 6); RV data and model v (primary: solid line, sec- 
ondary: dotted line, RV offset V: horizontal dashed line); RV residu- 
als Av (primary: normal error bars, secondary: dashed error bars, model: 
dashed line). AM error bars are defined as the sides of the smallest rect- 
angle orientated along the coordinate axes and containing the respective 
error ellipse. Abscissa values are mean anomalies M (Eq. 26), i. e. times 
folded with respect to the MAP estimates of the time of periapsis T and 
period P. 



stants are A — G — - cos(£2 + (02) and B = -F = - sin(Q + a>2). 
Thus, for orbits nearly face-on, the strong appearance of the sum 
Q. + C02 introduces a negative correlation between the two angles, 
because their contrarious change can lead to the same value of 
the model function. For Doppler-spectroscopic data, this is coun- 
teracted by the fact that the RV model function does not contain 
£1 

Models and residuals. Figure 7 presents all data with the mod- 
els calculated from the MAP estimates listed in Table B.l, as 
well as residual vectors and error ellipses for astrometry. While 
this analysis used both the older (Hummel et al. 1995) and newer 
(Hummel et al. 1998) AM in combination with RV data (Prevot 
1961), the AM model is very similar to the one calculated by 
Hummel et al. (1998, Fig. 11) using only the newer AM data 
because of the overall agreement in parameters. 
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a, [AU] a 2 [AU] a rcl [AU] 



Figure 5. Marginal posteriors of parameters e, f,x, o*i, i, CI, vj, V, Ki, K 2 (see also Table 1) and derived quantities 
P, T, d,p, Kz\ t j, Ka\ U 2, mi, m 2 , a.\, a 2 , fl re i (Table 2) with marginal-posterior medians (dotted line) and 68.27% HPDIs (dashed lines), from the 
final pass. Abscissae ranges are identical with the corresponding 95% HPDIs. Ordinate values were omitted but follow from normalisation. Open 
triangles on the upper abscissae indicate the MAP estimates. Open circles on the lower abscissae bound the confidence intervals given by or 
derived from Prevot (1961), while open triangles refer to the intervals of Hummel et al. (1998). Some of these literature estimates are not plotted 
because they are outside the abscissa range; for / and^, no literature uncertainty estimate is available. For derivations and numerical values, see 
Tables B.l and B.2. 

Table 8. p-values of the Kolmogorov-Smirnov statistic." 



Data type p-value 

AM 0.00098 

RV 0.75139 

All data 0.00331 



Notes. (a) These values are defined with respect to the final residuals 
and a standard normal distribution. 

mode. Again, due to the similarity in parameters, the folded RV 
plot in Fig. 8 is very similar to the corresponding figure in Prevot 
(1961). 

To assess the distribution of the residuals and compare it to 
a normal distribution, thus checking the validity of our noise 
model, we normalised the residuals of both data types as follows. 
For AM, we defined the normalised residual as a signed version 
of the Mahalanobis distance (Mahalanobis 1936) between the 
observed and the modelled values, 

PAM,i = Si Jiri-MtiWEiHri-Mti)), (61) 
with sign 

Si = sgn ((<pi - tfi)(n - [fa - (fiJ)) , (62) 
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Figure 9. Distribution of normalised residuals of AM data (long-dashed 
line), RV data (short-dashed) and all data (solid), along with the stan- 
dard normal distribution A/X0, 1) (dotted). For definitions of the nor- 
malised residuals, see Eqs. 61 - 65. 



Figure 8 shows all data, models and residuals, separately 
by coordinate for AM. The abscissa corresponds to the mean 
anomaly M (Eq. 26), i. e. the plots are folded with respect to a 
time of periapsis T and period P corresponding to the posterior 
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<■>-, 



a 










/ 



X 



0)2 



Q. 



Figure 6. Two-parameter joint marginal posterior densities fij(-, •) from pass E. The figure consists of 45 sub-plots, one for each combination of 
two parameters. Black denotes highest density. The inner and outer contours contain 50% and 64.86% probability, respectively. All plots aligned in 
one column share the same abscissa, denoted on the bottom; all plots aligned in one row share the same ordinate, denoted on the left. All abscissae 
and ordinates are displayed over the corresponding 95% HPDIs (Table B.l), such that the total probability content of each plot is between 90% 
and 95%. 



where (p t and 0, are the position angles of the residual and of the 
uncertainty ellipse (see Section 2.1), respectively. This definition 
allows us to write, according to Eq. 8, 

Nam 

XaM ~ 2j ^AM,i 
i=l 

For RV data, we define 
Vi - v(0; td 



(63) 



£>RV,i = 



(64) 



which analogously yields 



2 

/Try 



Nrv 



(65) 



The distributions of normalised residuals of both data types in- 
dividually as well as of all data, estimated using kernel density 
estimation (Section 3.4), are shown in Fig. 9. Table 8 lists the p- 
values of the Kolmogorov-Smirnov statistic, which relates each 
distribution to the standard normal distribution JV(0, 1). 
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Figure 7. AM and RV data and models calculated with the MAP parameter estimates (Table B.l). a) AM data (uncertainty ellipses, solid lines), 
model (large ellipse, solid line) and residual vectors (dashed lines), b) Residual AM error ellipses, c) RV data and model for primary (normal error 
bars, solid line) and secondary (hourglass-shaped error bars, dotted line). The horizontal dashed line indicates the RV offset V. RV residuals are 
presented in Fig. 8. 



The p- value equals the probability, under the hypothesis 
that the normalised residuals are randomly drawn from N(0, 1), 
to observe a distribution of normalised residuals that differs at 
least as much from N(0, 1) as is actually the case. This dif- 
ference is quantified here by the Kolmogorov-Smirnov statistic. 
Denoting the hypothesis of such an observation by "K bs> the p- 
value can be expressed as pCW bs We note that according 
to the Bayes theorem, this is not equal to the "inverse" probabil- 
ity pCHul'Hobs) °f the residuals coming from the normal distri- 
bution, given the observation. 

For the AM residuals, as well as for all residuals, a heavy- 
tailed distribution (Fig. 9) is observed and the low /rvalue indi- 
cates a minute probability of such an observation under Hu, i. e. 
the normalised residuals comply poorly with the standard normal 
distribution A/'(0, 1). This reflects the fact that several AM data 
points are outliers with respect to the observable and noise model 
(Fig. 7). We interpret this in terms of systematic effects in the 
measurements that are not contained in our noise model. In con- 
trast, the normalised RV residuals are more normally-distributed, 
and there are no such severe outliers (Fig. 7 and Fig. 8). 

According to the principle of maximum entropy, given only 
the mean and variance of a distribution, the normal distribution 
has maximum information-theoretic entropy, equivalent to min- 
imum bias or prejudice with respect to the missing information 



(e. g. Kapur 1989). Still, it is well-known that this widely used 
noise model is relatively prone to outliers; Lange et al. (1989) 
have suggested to replace it by a non-standardised f-distribution, 
resulting in the down-weighting of outliers. This distribution can 
be derived from the normal distribution under the assumption 
that the noise variance is unknown with a certain probability dis- 
tribution. The f-distribution has an additional unknown degrees- 
of-freedom parameter veR; for v = 1, it resembles the Cauchy- 
distribution. 

In contrast, our approach has been to regard every datum 
with its standard deviation as accurate, which also implies that 
we have not discarded any data as outliers. Under this assump- 
tion, deviating from the maximum-entropy principle by select- 
ing a different distribution introduces prior "knowledge" that we 
may not actually have and thus potentially biases the results. 

6. Conclusions 

We have presented BASE, a novel and highly configurable tool 
for Bayesian parameter and uncertainty estimation with respect 
to model parameters and additional derived quantities, which 
can be applied to AM as well as RV data of both exoplanet 
systems and binary stars. With user-specified or uninformative 
prior knowledge, it employs a combination of Markov chain 
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Monte Carlo (MCMC) and several other techniques to explore 
the whole parameter space and collect samples distributed ac- 
cording to the posterior distribution. We presented a new, simple 
method of refining the window width of one-dimensional kernel 
density estimation, which is used to derive marginal posterior 
densities. 

We derived the observable models from Newton's law of 
gravitation, neglecting the motion of the observer and the target 
system, which we showed is justified in the case of Mizar A. 
After sketching how we estimated the RV uncertainties that 
are missing in the original publication (Prevot 1961), we de- 
tailed our analysis of all publicly available AM and RV data 
of Mizar A. It consists of five consecutive stages and has pro- 
duced estimates of the values, uncertainties, and correlations of 
all model parameters and derived quantities, as well as marginal 
posterior densities over one and two dimensions. 

As illustrated in Fig. 5 and Table B.l, our new results exhibit 
overall compatibility with previous literature values; this is also 
the case for the models in Fig. 7, whose plots differ only slightly 
from those published earlier. Several outliers in the AM data are 
visible in the distribution of the corresponding normalised resid- 
uals, which deviates significantly more from a standard normal 
distribution than that of the RV residuals. Nevertheless, it is not 
necessary to remove outliers for our program to finish success- 
fully. 

In the near future, we plan to apply BASE to a potential exo- 
planet host star. In this study one of the aims will be to determine 
the existence probability of a planetary companion. 



If the lower prior bound of a scale parameter is zero, e. g. for 
the RV semi-amplitude K, a modified Jeffreys prior is used. 
It has the form 



p(0|M,ft) = 



0(0 - a) ®(b - 0) 
(0 + k )ln^ 



(A.5) 



where 0k is the knee of the prior. For <*c 0k, this prior is 
approximately uniform, while it approaches a Jeffreys prior 
for » 0k. 



Appendix B: Numerical posterior summaries 

The following tables list the numerical values of several pos- 
terior summaries. Those in Tables B.l and B.2 are derived from 
the marginal posteriors (Fig. 5), while the correlation coefficients 
in Table B.3 reflect linear relations between parameters and, in 
this respect, can be regarded as summaries of the joint marginal 
posteriors (Fig. 6). 

Compared to the underlying densities, all of these summaries 
are incomplete. Still, they are useful e. g. for the calculation of 
model functions or comparison with literature results. 
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Appendix A: Encoding prior knowledge 

By means of the prior, Bayesian analysis allows one to incorpo- 
rate knowledge obtained earlier, e. g. using different data. When 
no prior knowledge is available for some model parameter, ex- 
cept for its allowed range, maximum prior ignorance about the 
parameter can be encoded by a prior of one of the following 
functional forms for the most common classes of location and 
scale parameters (Gregory 2005b; Sivia 2006). 

- For a location parameter, we demand that the prior be invari- 
ant against a shift A in the parameter, i. e. 

p(0| At, <K) d0 = p(0 + A| At, <K) d(0 + A), ( A. 1 ) 

which leads to the uniform prior 

m M,K > - efe -"> e( >- a > , (A.2) 
b — a 

where &(-),a, and b are the Heaviside step function, the 
lower and the upper prior bounds. 

Here, we note that the frequentist approach, lacking an ex- 
plicit definition of the prior, corresponds to the implicit as- 
sumption of a uniform prior for all parameters. 

- A positive scale parameter, which often spans several 
decades, is characterised by its invariance against a stretch 
of the coordinate axis by a factor <p, i. e. 

p(0|At, 7C) d0 = p(^0|At, K) d(<p6), (A.3) 

which is solved by the Jeffreys prior, 

P^IM^ ^?^ (A.4) 
01n^ 

a 

That a uniform prior would be inappropriate for this param- 
eter is also illustrated by the fact that it would assign higher 
probabilities to lying in a higher decade of [a, b] than in a 
lower. 
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Table B.l. Model parameters: new and previous results." 



Estimate 


e 


JT 

J 


X 


Cl>2 


i 


12 


UJ 


V 




K-2 


Reference 






(d ') 




(rad) 


(rad) 


(rad) 


(mas) 


n -Is 

(kms ) 


(kms ) 


/I -1\ 
(kms ) 







0.5304 


0.0486 89388 


0.93322 


4.9771 


1.0530 


1.8381 


38.74 


-6.02 


58.84 


57.16 







0.5299 


0.0486 89403 


0.93315 


4.9779 


1.0528 


1.8374 


38.69 


-6.20 


58.60 


57.39 







0.5295 


0.0486 89409 


0.93294 


4.9784 


1.0527 


1.8365 


38.91 


-6.04 


58.41 


56.97 




9 


0.5281 


0.0486 89438 


0.93244 


4.9807 


1.0515 


1.8346 


39.02 


-6.02 


58.21 


56.83 




. 


0.5282 


0.0486 89341 


0.93227 


4.9751 


1.0513 


1.8345 


38.13 


-6.93 


57.14 


55.47 




-'HPD, 50% 


0.5317 


0.0486 89467 


0.93379 


4.9806 


1.0549 


1.8397 


39.66 


-5.21 


60.21 


58.47 






0.5270 


0.0486 89295 


0.93175 


4.9735 


1.0500 


1.8327 


37.74 


-7.31 


56.18 


54.69 




-*HPD, 68.27% 


0.5326 


0.0486 89509 


0.93430 


4.9825 


1.0558 


1.8411 


40.06 


-4.68 


60.84 


59.23 




^HPD, 95% 


0.5152 


0.0486 88907 


0.92550 


4.9653 


1.0383 


1.8165 


35.73 


-9.96 


51.77 


50.35 




0.5380 


0.048690031 


0.93860 


5.0019 


1.0615 


1.8485 


42.24 


-2.08 


64.54 


63.16 




^HPD, 99% 


0.4905 


0.0486 88483 


0.91472 


4.9530 


1.0134 


1.7790 


34.24 


-12.40 


46.87 


45.66 




0.5451 


0.0486 90874 


0.94285 


5.0423 


1.0699 


1.8593 


45.02 


0.22 


67.05 


66.49 




Lit. estimate 


0.537 


0.0486 8881* 


0.93487* 


4.9595 








-5.64 


58.04 d 


57.03 d 


(1) 


Uncertainty 


0.004 


c 


C 


0.0201 








0.15 


0.70 


0.80 


Lit. estimate 


0.5354 


0.0486 89403* 


0.93524* 


4.9620 


1.0559 


1.850 


39.4 




58.33 e 


56.69 e 


(2) 


Uncertainty 


0.0025 


0.000000119 


0.00097 


0.0052 


0.0052 


0.007 


0.3 




2.40 


2.33 



Notes. <a> For definitions of the estimates, see Section 2.2.2. <fc) Literature values and uncertainties are calculated from t t and the original parameters 
P and T according to Table 2. (c) Uncertainties are missing for P in Prevot (1961) and thus for / and^. id) K t and K 2 are derived from K a[t [ and 
-ffaia via e according to Table 2. (e) Ki and K 2 are derived from P, a' Kl , cr, p and ( using Eqs. 40, 44 and 47. 

References. (1) Prevot (1961); (2) Hummel et al. (1998) 



Table B.2. Derived quantities: new and previous results." 



Estimate 


P 


y b 


d 


P 


^alt,l 


^alt,2 


nil 


m 2 


ay 


(12 


a rcl 


Reference 




(d) 


(d) 


(pc) 




(kms" 1 ) 


(kms" 1 ) 


(M Q ) 


(M Q ) 


(AU) 


(AU) 


(AU) 




6 


20.53 8350 


36997.247 


25.74 


1.028 


68.70 


68.05 


2.477 


2.500 


0.12657 


0.12418 


0.25074 




e 


20.53 8347 


36997.251 


25.69 


1.025 


68.85 


67.15 


2.459 


2.517 


0.12678 


0.12353 


0.25068 




e 


20.53 8335 


36997.262 


25.66 


1.027 


68.56 


66.93 


2.455 


2.521 


0.12658 


0.12393 


0.25017 






20.53 8322 


36997.234 


25.18 


0.986 


67.30 


65.39 


2.320 


2.311 


0.12331 


0.11742 


0.24628 




^HPD, 50% 


20.53 8375 


36997.265 


26.20 


1.061 


70.90 


68.91 


2.609 


2.689 


0.13051 


0.12987 


0.25555 






20.53 8305 


36997.223 


24.94 


0.969 


66.26 


64.48 


2.238 


2.228 


0.12156 


0.11385 


0.24388 




^HPD, 68.27% 


20.53 8395 


36997.276 


26.47 


1.082 


71.71 


69.82 


2.678 


2.809 


0.13269 


0.13293 


0.25803 






20.53 8085 


36997.135 


23.46 


0.866 


60.95 


59.10 


1.827 


1.775 


0.11252 


0.10058 


0.22977 




^HPD, 95% 


20.53 8558 


36997.404 


27.72 


1.187 


75.99 


74.19 


3.051 


3.235 


0.14028 


0.14681 


0.26908 






20.53 7729 


36997.044 


21.98 


0.751 


54.61 


53.79 


1.475 


1.463 


0.10452 


0.08958 


0.21512 




^HPD, 99% 


20.53 8737 


36997.622 


28.92 


1.305 


78.41 


78.28 


3.415 


3.641 


0.14637 


0.16321 


0.27939 




Lit. estimate 


20.53 860 


36997.212 




1.018 


68.80 


67.60 












(1) 


Uncertainty 




0.022 




0.018 


0.79 


0.91 












Lit. estimate 


20.53 835 


36997.20 


25.38 


1.029 


69.06 


67.13 


2.43 


2.50 


0.12652 


0.12297 


0.24949 


(2) 


Uncertainty 


0.00005 


0.03 


0.19 


0.041 


3.84 


3.74 


0.07 


0.07 


0.00519 


0.00504 


0.00205 



Notes. (a) For definitions of the estimates, see Section 2.2.2. <b) T is given in the reduced Julian date scale, i. e. as Julian Date - 2.4 • 10 6 d. 



References. (1) Prevot (1961); (2) Hummel et al. (1998) 

Table B.3. Pearson correlation coefficients of pairs of parameters. 





e 


/ 


X 


a>2 


i 


Q. 


m 


V 




/ 


-0.21017 


















X 


0.35120 


-0.43922 
















i 


-0.62560 
0.52716 


0.18818 
-0.11723 


-0.35074 
0.28022 


-0.57440 












a. 


0.60484 


-0.22616 


0.34159 


-0.62355 


0.47121 












0.02224 


0.04643 


-0.03317 


-0.02066 


0.05501 


-0.00134 








V 


-0.01699 


0.02057 


-0.02176 


0.01468 


-0.01012 


-0.01436 


0.00752 








0.11517 


-0.07289 


0.09702 


-0.12194 


0.09822 


0.11416 


-0.33947 


0.00868 




K 2 


0.04381 


-0.03506 


0.04188 


-0.04901 


0.03520 


0.04618 


-0.32930 


-0.02639 


0.02274 
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